Reasoning support for evaluators #42482
base: main
Conversation
Add pyrit and not remove the other one
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_coherence/_coherence.py
…tors/_coherence/_coherence.py Co-authored-by: Ankit Singhal <[email protected]>
Pull Request Overview
This PR adds support for reasoning models to evaluators by introducing an `is_reasoning_model` keyword parameter. When set, this parameter updates the evaluator configuration appropriately for reasoning models, enabling better integration with Azure OpenAI's reasoning capabilities.
Key Changes:
- Added `is_reasoning_model` parameter to all evaluators' constructors
- Updated `QAEvaluator` to propagate this parameter to child evaluators
- Added defensive parameter checking in `GroundednessEvaluator` for backward compatibility
- Updated documentation across evaluators to describe the new parameter
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `_similarity/_similarity.py` | Added `is_reasoning_model` parameter and updated docstrings |
| `_retrieval/_retrieval.py` | Added `is_reasoning_model` parameter support |
| `_response_completeness/_response_completeness.py` | Added `is_reasoning_model` parameter and improved formatting |
| `_relevance/_relevance.py` | Added `is_reasoning_model` parameter support |
| `_qa/_qa.py` | Updated to propagate `is_reasoning_model` to child evaluators |
| `_groundedness/_groundedness.py` | Added parameter support with backward compatibility checks |
| `_fluency/_fluency.py` | Added `is_reasoning_model` parameter and updated docstrings |
| `_base_prompty_eval.py` | Updated to pass `is_reasoning_model` to `AsyncPrompty.load` |
| `_base_multi_eval.py` | Minor import formatting improvement |
| `_coherence/_coherence.py` | Added `is_reasoning_model` parameter and updated docstrings |
| `CHANGELOG.md` | Documented the new feature and bug fix |
...valuation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_groundedness/_groundedness.py
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_coherence/_coherence.py
…tors/_groundedness/_groundedness.py Co-authored-by: Copilot <[email protected]>
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluate/_evaluate.py
… evaluate_kwargs pop scope fix
… docstring; tests; add AZEVAL_USE_PROMPTFLOW override.
…; improve reasoning error hints; add tests
…ader_order_debug_sample.py
Pull Request Overview
Copilot reviewed 20 out of 20 changed files in this pull request and generated 5 comments.
```diff
 if is_reasoning_model:
-    parameters = configs.get("model", {}).get("parameters", {})
+    parameters = configs.get("model", {}).get("parameters") or {}
```
[nitpick] The expression `or {}` is redundant since `dict.get()` with a default empty dict already handles the None case. Consider simplifying to `parameters = configs.get("model", {}).get("parameters", {})`.
Suggested change:

```diff
-parameters = configs.get("model", {}).get("parameters") or {}
+parameters = configs.get("model", {}).get("parameters", {})
```
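For what it's worth, the two forms are not strictly interchangeable: `dict.get(key, default)` only applies the default when the key is *missing*, while `or {}` also replaces an explicit `None` (or any other falsy value). A quick illustration:

```python
# Key present but explicitly None, e.g. an empty "parameters:" block in YAML.
configs = {"model": {"parameters": None}}

with_default = configs.get("model", {}).get("parameters", {})  # default is not used
with_or = configs.get("model", {}).get("parameters") or {}     # None coerced to {}

print(with_default)  # None
print(with_or)       # {}
```

So the `or {}` form in the diff is the safer choice if the configuration can carry a present-but-empty `parameters` entry.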
```python
if is_reasoning_model:
    # Merge sanitized parameters into the override model dict so PF uses them
    base_params = _extract_prompty_parameters(source)
    sanitized = _sanitize_reasoning_parameters(base_params)
    override_params = dict(model.get("parameters", {}) or {})
    override_params.update(sanitized)
    model["parameters"] = override_params
    kwargs["model"] = model
```
This reasoning-model parameter sanitization code is unreachable because it is placed after the early return on line 69, which is taken when `is_reasoning_model` is True. The logic should be moved before the early return, or the condition should be restructured.
```python
_use_run_submitter_client = cast(Optional[bool], evaluate_kwargs.pop("_use_run_submitter_client", None))
_use_pf_client = cast(Optional[bool], evaluate_kwargs.pop("_use_pf_client", None))
```
The variable name is incorrect: this should call `kwargs.pop()` rather than `evaluate_kwargs.pop()`, since the function operates on the `kwargs` dict referenced in the original code, not on a parameter named `evaluate_kwargs`.
```diff
-import os, logging
+import os
+import logging
+from inspect import signature
```
The `signature` import from `inspect` is not used in this file and should be removed to avoid an unused import.

Suggested change:

```diff
-from inspect import signature
```
```python
# Ensure we do not override prompty temperature/max_tokens in model parameters
# Only extra_headers should be present in parameters added by code
model_cfg = kwargs.get("model")
assert isinstance(model_cfg, dict)
params = model_cfg.get("parameters", {})
# Our code only sets extra_headers inside parameters; temperature/max_tokens come from prompty
```
[nitpick] The comment is misleading. The test verifies that temperature and max_tokens are NOT in the parameters dict, but the comment suggests they should be preserved from the prompty file. Consider clarifying that these parameters should be removed for reasoning models.
Suggested change:

```diff
-# Ensure we do not override prompty temperature/max_tokens in model parameters
-# Only extra_headers should be present in parameters added by code
+# For reasoning models, temperature and max_tokens should be removed from parameters.
+# Only extra_headers should be present in parameters added by code.
 model_cfg = kwargs.get("model")
 assert isinstance(model_cfg, dict)
 params = model_cfg.get("parameters", {})
-# Our code only sets extra_headers inside parameters; temperature/max_tokens come from prompty
+# Our code only sets extra_headers inside parameters; temperature/max_tokens are intentionally omitted for reasoning models.
```
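To make the intent of that assertion concrete, a minimal check along these lines (a sketch; the real test harness and fixture names are not shown in this diff) would fail loudly if the sanitization ever regressed:

```python
def assert_reasoning_parameters_sanitized(model_cfg: dict) -> None:
    # For reasoning models, temperature/max_tokens must be absent;
    # only code-injected keys such as extra_headers may remain.
    params = model_cfg.get("parameters") or {}
    assert "temperature" not in params, "temperature must be stripped for reasoning models"
    assert "max_tokens" not in params, "max_tokens must be stripped for reasoning models"


# Passes: only extra_headers is present.
assert_reasoning_parameters_sanitized(
    {"parameters": {"extra_headers": {"x-ms-useragent": "azure-ai-evaluation"}}}
)
```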
Description
Please add an informative description that covers the changes made by the pull request and link all relevant issues.
If an SDK is being regenerated based on a new API spec, a link to the pull request containing these API spec changes should be included above.
All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines